Results 1 - 20 of 28
1.
Article in English | MEDLINE | ID: mdl-38683712

ABSTRACT

Sample selection approaches are popular in robust learning from noisy labels. However, controlling the selection process so that deep networks can benefit from the memorization effect is a hard problem. In this paper, motivated by the success of automated machine learning (AutoML), we propose to control the selection process by bi-level optimization. Specifically, we parameterize the selection process by exploiting the general patterns of the memorization effect in the upper level, and then update these parameters using the prediction accuracy obtained from model training in the lower level. We further introduce semi-supervised learning algorithms to utilize noisy-labeled data as unlabeled data. To solve the bi-level optimization problem efficiently, we exploit more information from the validation curvature via the Newton method and the cubic regularization method. We provide convergence analysis for both optimization methods. Results show that while both methods converge to an (approximately) stationary point, the cubic regularization method finds better local optima than the Newton method in less time. Experiments on both benchmark and real-world data sets demonstrate that the proposed search method leads to significant improvements over existing methods. Compared with existing AutoML approaches, our method is much more efficient at finding a good selection schedule.
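As a rough illustration of the idea (not the authors' implementation), the sketch below pairs a small-loss selection rule, which exploits the memorization effect, with a hypothetical upper-level parameterization of the selection schedule; the names `tau` and `theta` are assumptions of this sketch.

```python
import numpy as np

def selection_schedule(epoch, total_epochs, tau=0.4, theta=(1.0, 0.5)):
    # Hypothetical upper-level parameterization: keep all samples early,
    # then anneal towards keeping (1 - tau) of them, where tau is an
    # estimated noise rate and theta would be tuned by the bi-level search.
    a, b = theta
    return 1.0 - tau * min(a * (epoch / total_epochs) ** b, 1.0)

def small_loss_selection(losses, ratio):
    """Keep the fraction `ratio` of samples with the smallest loss;
    by the memorization effect, low-loss samples are likely clean."""
    k = max(1, int(len(losses) * ratio))
    return np.argsort(losses)[:k]

rng = np.random.default_rng(0)
losses = rng.exponential(size=8)                  # per-sample losses in a batch
ratio = selection_schedule(epoch=10, total_epochs=100)
print(small_loss_selection(losses, ratio))        # indices of presumed-clean samples
```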

2.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 6231-6246, 2023 May.
Article in English | MEDLINE | ID: mdl-36094971

ABSTRACT

The feature extractor plays a critical role in text recognition (TR), but customizing its architecture is relatively underexplored due to expensive manual tweaking. In this article, inspired by the success of neural architecture search (NAS), we propose to search for suitable feature extractors. We design a domain-specific search space by exploring principles behind good feature extractors. The space includes a 3D-structured space for the spatial model and a transformer-based space for the sequential model. As the space is huge and complexly structured, no existing NAS algorithm can be applied directly. We propose a two-stage algorithm to search the space effectively. In the first stage, we cut the space into several blocks and progressively train each block with the help of an auxiliary head. In the second stage, we introduce a latency constraint and search for a sub-network from the trained supernet via natural gradient descent. In experiments, a series of ablation studies are performed to better understand the designed space, the search algorithm, and the searched architectures. We also compare the proposed method with various state-of-the-art methods on both handwritten and scene TR tasks. Extensive results show that our approach achieves better recognition performance with lower latency. Code is available at https://github.com/AutoML-Research/TREFE.
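To make the second stage concrete, here is a deliberately simplified sketch of latency-constrained sub-network selection. It brute-forces a tiny per-block choice space instead of the paper's natural gradient descent over a trained supernet, and all block candidates, score proxies, and latencies are made up for illustration.

```python
import itertools

# Hypothetical per-block candidates: (name, validation-score proxy, latency in ms).
# In the paper these would come from the trained supernet; here they are invented.
blocks = [
    [("conv3x3", 0.90, 2.0), ("conv5x5", 0.92, 3.5), ("skip", 0.80, 0.1)],
    [("conv3x3", 0.88, 2.0), ("pool", 0.84, 0.5)],
    [("attn", 0.93, 4.0), ("conv3x3", 0.89, 2.0)],
]
budget_ms = 6.0

best = None
for choice in itertools.product(*blocks):
    latency = sum(op[2] for op in choice)
    score = sum(op[1] for op in choice)      # additive proxy for accuracy
    if latency <= budget_ms and (best is None or score > best[0]):
        best = (score, latency, [op[0] for op in choice])

print(best)   # best ≈ (2.67, 6.0, ['conv3x3', 'conv3x3', 'conv3x3'])
```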

3.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 1458-1473, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35254979

ABSTRACT

Learning embeddings for entities and relations in knowledge graphs (KGs) has benefited many downstream tasks. In recent years, scoring functions, the crux of KG learning, have been human-designed to measure the plausibility of triples and to capture different kinds of relations in KGs. However, as relations exhibit intricate patterns that are hard to infer before training, none of them consistently performs best on benchmark tasks. In this paper, inspired by the recent success of automated machine learning (AutoML), we search for bilinear scoring functions for different KG tasks using AutoML techniques. However, it is non-trivial to exploit domain-specific information here. We first set up a search space for AutoBLM by analyzing existing scoring functions. Then, we propose a progressive algorithm (AutoBLM) and an evolutionary algorithm (AutoBLM+), which are further accelerated by a filter and a predictor to deal with the domain-specific properties of KG learning. Finally, we perform extensive experiments on benchmarks in KG completion, multi-hop query, and entity classification tasks. Empirical results show that the searched scoring functions are KG-dependent, new to the literature, and outperform existing scoring functions. AutoBLM+ is better than AutoBLM, as the evolutionary algorithm can flexibly explore better structures within the same budget.
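A minimal sketch of what a block-structured bilinear scoring function looks like is given below; the K x K structure matrix with entries in {-1, 0, +1} is a simplified stand-in for the actual AutoBLM search space, and the chunk-mixing convention is an assumption of this sketch.

```python
import numpy as np

def bilinear_score(h, r, t, structure, K=4):
    """Score a triple (h, r, t) with a block-structured bilinear form.

    h, r, t are embeddings split into K chunks; `structure` is a K x K
    matrix with entries in {-1, 0, +1} choosing how relation chunks mix
    head and tail chunks (a simplified stand-in for the AutoBLM space)."""
    hs, rs, ts = (np.split(v, K) for v in (h, r, t))
    score = 0.0
    for i in range(K):
        for j in range(K):
            if structure[i, j] != 0:
                # each nonzero cell contributes a triple product of chunks;
                # using r chunk j here is just this sketch's convention
                score += structure[i, j] * np.sum(hs[i] * rs[j] * ts[j])
    return score

d, K = 8, 4
rng = np.random.default_rng(1)
h, r, t = rng.normal(size=(3, d))
# the identity structure recovers DistMult: sum(h * r * t)
distmult = np.eye(K, dtype=int)
print(bilinear_score(h, r, t, distmult), np.sum(h * r * t))   # equal
```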

4.
Article in English | MEDLINE | ID: mdl-36342998

ABSTRACT

Training deep neural networks (DNNs) typically requires massive computational power. Existing DNNs exhibit low time and storage efficiency due to their high degree of redundancy. In contrast, biological and social networks with vast numbers of connections are highly efficient and exhibit scale-free properties indicative of a power-law degree distribution, which can originate from preferential attachment in growing networks. In this work, we ask whether the topology of the best-performing DNNs shows a power law similar to that of biological and social networks, and how power-law topology can be used to construct well-performing, compact DNNs. We first find that the connectivities of sparse DNNs can be modeled by a truncated power-law distribution, one of the variants of the power law. Comparing different DNNs reveals that the best-performing networks correlate highly with the power-law distribution. We further model preferential attachment in the evolution of DNNs and find that continual learning in networks growing with new tasks correlates with the process of preferential attachment. These identified power-law dynamics in DNNs enable the construction of highly accurate and compact DNNs based on preferential attachment. Inspired by these findings, we propose two novel applications: evolving optimal DNNs in sparse network generation, and continual learning with efficient network growth using power-law dynamics. Experimental results indicate that the proposed applications can speed up training, save storage, and learn with fewer samples than other well-established baselines. Our demonstration of preferential attachment and power laws in well-performing DNNs offers insights into designing and constructing more efficient deep learning models.
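The growth process can be illustrated with a toy sketch that grows a sparse layer mask by preferential attachment, new edges attaching to output units in proportion to their current degree; this is a generic Barabasi-Albert-style construction, not the paper's exact procedure.

```python
import numpy as np

def grow_preferential_mask(n_in, n_out, n_edges, seed=0):
    """Grow a sparse connectivity mask by preferential attachment.

    New edges attach to output units proportionally to their current
    degree (plus 1 so zero-degree units remain reachable), yielding a
    heavy-tailed, roughly power-law degree distribution."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_in, n_out), dtype=bool)
    degree = np.ones(n_out)                     # +1 smoothing
    for _ in range(n_edges):
        i = rng.integers(n_in)                  # input end: uniform
        j = rng.choice(n_out, p=degree / degree.sum())   # output end: preferential
        if not mask[i, j]:
            mask[i, j] = True
            degree[j] += 1
    return mask

mask = grow_preferential_mask(64, 32, 300)
# unique edges placed, and the degrees of the top hub units
print(mask.sum(), np.sort(mask.sum(axis=0))[-5:])
```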

5.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6153-6168, 2022 Oct.
Article in English | MEDLINE | ID: mdl-34061741

ABSTRACT

In real-world applications, it is important for machine learning algorithms to be robust against data outliers and corruptions. In this paper, we focus on improving the robustness of a large class of learning algorithms formulated as low-rank semidefinite programming (SDP) problems. Traditional formulations use the square loss, which is notoriously sensitive to outliers. We propose to replace it with more robust noise models, including the l1-loss and other nonconvex losses. However, the resultant optimization problem becomes difficult as the objective is no longer convex or smooth. To alleviate this problem, we design an efficient algorithm based on majorization-minimization. The crux is constructing a good optimization surrogate, and we show that this surrogate can be obtained efficiently by the alternating direction method of multipliers (ADMM). By properly monitoring ADMM's convergence, the proposed algorithm is empirically efficient and theoretically guaranteed to converge to a critical point. Extensive experiments are performed on four machine learning applications using both synthetic and real-world data sets. Results show that the proposed algorithm is not only fast but also outperforms the state of the art.
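The majorization-minimization idea can be seen in miniature on l1-loss regression, where the standard majorizer |r| <= r^2/(2|r0|) + |r0|/2 turns each step into a weighted least-squares solve; this toy stand-in omits the low-rank SDP structure and the ADMM surrogate construction of the paper.

```python
import numpy as np

def mm_l1_regression(X, y, n_iter=50, eps=1e-6):
    """Majorization-minimization for l1-loss linear regression.

    Each iteration majorizes |r| at the current residual r0, so the
    surrogate is a weighted least-squares problem (IRLS); the same
    surrogate idea applies to the robust low-rank SDPs in the paper."""
    w = np.linalg.lstsq(X, y, rcond=None)[0]    # initialize with square loss
    for _ in range(n_iter):
        r = np.abs(X @ w - y) + eps             # current residual magnitudes
        W = 1.0 / r                             # majorizer weights
        A = X * W[:, None]
        w = np.linalg.solve(X.T @ A, A.T @ y)   # weighted LS solve
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
y[:10] += 20 * rng.normal(size=10)              # gross outliers
print(mm_l1_regression(X, y))                   # close to w_true despite outliers
```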

6.
IEEE Trans Neural Netw Learn Syst ; 32(2): 788-798, 2021 Feb.
Article in English | MEDLINE | ID: mdl-32275614

ABSTRACT

A least squares support vector machine (LS-SVM) offers performance comparable to that of SVMs for classification and regression. The main limitation of LS-SVM is that it lacks sparsity compared with SVMs, making it unsuitable for large-scale data due to computation and memory costs. To obtain a sparse LS-SVM, several pruning methods based on an iterative strategy were recently proposed, but they do not consider a quantity constraint on the number of reserved support vectors, which is widely used in real-life applications. In this article, a noniterative algorithm based on the selection of globally representative points (global-representation-based sparse least squares support vector machine, GRS-LSSVM) is proposed to improve the performance of sparse LS-SVM. For the first time, we present a model of sparse LS-SVM with a quantity constraint. In solving for the optimal solution of the model, we find that using globally representative points to construct the reserved support vector set produces a better solution than other methods. We design an indicator based on point density and point dispersion to evaluate how well each point represents the data globally in feature space. Using the indicator, the top globally representative points are selected in one step from all points to construct the reserved support vector set of the sparse LS-SVM. After obtaining the set, the decision hyperplane is computed directly using an algebraic formula. The algorithm has O(N²) computational complexity and O(N) memory cost, which makes it suitable for large-scale data sets. Experimental results show that the proposed algorithm has higher sparsity, greater stability, and lower computational complexity than traditional iterative algorithms.
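A plausible reading of the density-plus-dispersion indicator is sketched below in the spirit of density-peaks selection; the exact scoring formula is an assumption of this sketch, not the paper's definition.

```python
import numpy as np

def global_representation_scores(X, bandwidth=1.0):
    """Score points by density * dispersion (an assumed combination).

    density: Gaussian-kernel local density; dispersion: distance to the
    nearest point of higher density, so top-scoring points are both in
    dense regions and spread out across the data."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    density = np.exp(-(D / bandwidth) ** 2).sum(axis=1)
    n = len(X)
    dispersion = np.empty(n)
    for i in range(n):
        higher = density > density[i]
        dispersion[i] = D[i, higher].min() if higher.any() else D[i].max()
    return density * dispersion

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
top = np.argsort(global_representation_scores(X))[-10:]   # reserved support vectors
print(top)
```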

7.
Article in English | MEDLINE | ID: mdl-32224456

ABSTRACT

Convolutional sparse coding (CSC) can learn representative shift-invariant patterns from multiple kinds of data. However, existing CSC methods can only model noise from a Gaussian distribution, which is restrictive and unrealistic. In this paper, we propose a generalized CSC model capable of dealing with complicated unknown noise. The noise is modeled by a Gaussian mixture model, which can approximate any continuous probability density function. We use the expectation-maximization algorithm to solve the problem and design an efficient method for the weighted CSC problem in the maximization step. The crux is to speed up the convolution in the frequency domain while keeping the other computations involving the weight matrix in the spatial domain. Besides, we simultaneously update the dictionary and the codes by a nonconvex accelerated proximal gradient algorithm without introducing extra alternating loops. The resultant method, called generalized convolutional sparse coding (GCSC), has the same space complexity and a smaller running time compared with existing CSC methods. Extensive experiments on synthetic and real-world noisy data sets validate that GCSC can model noise effectively and obtain high-quality filters and representations.
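The E-step that produces the weights for the weighted CSC subproblem can be sketched as follows: responsibilities under a zero-mean Gaussian mixture yield a per-entry precision weight for the M-step. The two-component setup below is illustrative only.

```python
import numpy as np

def gmm_weights(residuals, pi, sigma):
    """E-step of EM under a zero-mean Gaussian-mixture noise model.

    Returns per-entry weights w_i = sum_k gamma_ik / sigma_k^2, which
    turn the M-step into a weighted least-squares (here: weighted CSC)
    problem; gamma are the component responsibilities."""
    r = residuals[:, None]                                   # (n, 1)
    log_p = -0.5 * (r / sigma) ** 2 - np.log(sigma) + np.log(pi)
    log_p -= log_p.max(axis=1, keepdims=True)                # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                # responsibilities
    return (gamma / sigma**2).sum(axis=1)

rng = np.random.default_rng(4)
res = np.concatenate([rng.normal(0, 0.1, 95), rng.normal(0, 3.0, 5)])
w = gmm_weights(res, pi=np.array([0.9, 0.1]), sigma=np.array([0.1, 3.0]))
print(w[:3], w[-3:])    # outlier-like entries receive much smaller weights
```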

8.
IEEE Trans Neural Netw Learn Syst ; 30(11): 3517-3527, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31403443

ABSTRACT

Many machine learning problems involve learning a low-rank positive semidefinite matrix. However, existing solvers for this low-rank semidefinite program (SDP) are often expensive. In this paper, by factorizing the target matrix as a product of two matrices and using a Courant penalty to penalize for their difference, we reformulate the SDP as a biconvex optimization problem. This allows the use of multiconvex optimization techniques to define simple surrogates, which can be minimized easily by block coordinate descent. Moreover, while traditionally this biconvex problem approaches the original problem only when the penalty parameter is infinite, we show that the two problems are equivalent when the penalty parameter is sufficiently large. Experiments on a number of SDP applications in machine learning show that the proposed algorithm is as accurate as other state-of-the-art algorithms, but is much faster, especially on large data sets.
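A toy instance of the factorize-and-penalize reformulation is sketched below on plain low-rank matrix approximation: block coordinate descent on ||M - UV^T||_F^2 + (rho/2)||U - V||_F^2, where each block update is a closed-form least-squares solve. The paper's setting (general low-rank SDPs) is richer than this.

```python
import numpy as np

def biconvex_lowrank(M, r, rho=10.0, n_iter=100, seed=5):
    """Minimize ||M - U V^T||_F^2 + (rho/2)||U - V||_F^2 by block
    coordinate descent; the Courant penalty pushes U towards V so that
    U V^T approximates a symmetric PSD low-rank matrix."""
    n = M.shape[0]
    rng = np.random.default_rng(seed)
    U, V = rng.normal(size=(2, n, r))
    I = np.eye(r)
    for _ in range(n_iter):
        # each block update solves an unconstrained least-squares problem
        U = (2 * M @ V + rho * V) @ np.linalg.inv(2 * V.T @ V + rho * I)
        V = (2 * M.T @ U + rho * U) @ np.linalg.inv(2 * U.T @ U + rho * I)
    return U, V

A = np.random.default_rng(6).normal(size=(8, 2))
M = A @ A.T                                  # rank-2 PSD target
U, V = biconvex_lowrank(M, r=2)
print(np.linalg.norm(M - U @ V.T))           # residual should be small; U ≈ V
```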

9.
IEEE Trans Pattern Anal Mach Intell ; 41(11): 2628-2643, 2019 Nov.
Article in English | MEDLINE | ID: mdl-30040624

ABSTRACT

Low-rank modeling has many important applications in computer vision and machine learning. While the matrix rank is often approximated by the convex nuclear norm, the use of nonconvex low-rank regularizers has demonstrated better empirical performance. However, the resulting optimization problem is much more challenging, and recent state-of-the-art solvers require an expensive full SVD in each iteration. In this paper, we show that for many commonly used nonconvex low-rank regularizers, the singular values obtained from the proximal operator can be automatically thresholded. This allows the proximal operator to be efficiently approximated by the power method. We then develop a fast proximal algorithm and its accelerated variant with an inexact proximal step. It can be guaranteed that the squared distance between consecutive iterates converges at a rate of O(1/T), where T is the number of iterations. Furthermore, we show that the proposed algorithm can be parallelized, and the resultant algorithm achieves nearly linear speedup w.r.t. the number of threads. Extensive experiments are performed on matrix completion and robust principal component analysis. Significant speedup over the state of the art is observed.
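The automatic-thresholding observation suggests the following sketch: extract leading singular pairs one at a time by the power method, with deflation, and stop as soon as a singular value drops below the threshold, since the proximal operator would zero it anyway. The soft-threshold `shrink` used here is a placeholder for a nonconvex shrinkage, and the whole routine is a simplification of the paper's inexact proximal step.

```python
import numpy as np

def top_singular_pair(A, n_iter=50, seed=0):
    """Power iteration for an approximate leading singular triple of A."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=A.shape[1])
    for _ in range(n_iter):
        u = A @ v
        u /= np.linalg.norm(u)
        v = A.T @ u
        v /= np.linalg.norm(v)
    return u @ A @ v, u, v

def approx_prox(A, thresh, shrink):
    """Approximate proximal operator without a full SVD: peel off
    singular pairs by power iteration plus deflation, stopping once
    sigma <= thresh because those values would be zeroed anyway."""
    R, out = A.copy(), np.zeros_like(A)
    while True:
        s, u, v = top_singular_pair(R)
        if s <= thresh:
            return out
        out += shrink(s) * np.outer(u, v)
        R -= s * np.outer(u, v)               # deflate

A = np.random.default_rng(7).normal(size=(20, 15))
X = approx_prox(A, thresh=3.0, shrink=lambda s: s - 1.0)   # placeholder shrinkage
print(np.linalg.matrix_rank(X))               # only the kept components remain
```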

10.
IEEE Trans Image Process ; 27(10): 4850-4859, 2018 Oct.
Article in English | MEDLINE | ID: mdl-29969396

ABSTRACT

Convolutional sparse coding (CSC) improves sparse coding by learning a shift-invariant dictionary from the data. However, most existing CSC algorithms operate in batch mode and are computationally expensive. In this paper, we alleviate this problem by online learning. The key is a reformulation of the CSC objective so that convolution can be handled easily in the frequency domain and much smaller history matrices are needed. To solve the resultant optimization problem, we use the alternating direction method of multipliers (ADMM), whose subproblems have efficient closed-form solutions. Theoretical analysis shows that the learned dictionary converges to a stationary point of the optimization problem. Extensive experiments are performed on both standard CSC benchmark data sets and much larger data sets such as ImageNet. Results show that the proposed algorithm outperforms state-of-the-art batch and online CSC methods: it is more scalable, converges faster, and has better reconstruction performance.
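The frequency-domain reformulation rests on the convolution theorem, which the snippet below verifies numerically: circular convolution in the spatial domain equals elementwise multiplication of FFTs, so convolution-heavy subproblems decouple per frequency. The signals here are arbitrary stand-ins for a filter and its code.

```python
import numpy as np

rng = np.random.default_rng(8)
d = rng.normal(size=16)          # one dictionary filter (zero-padded to length n)
z = rng.normal(size=16)          # its sparse code

# frequency domain: elementwise product of FFTs
spatial = np.real(np.fft.ifft(np.fft.fft(d) * np.fft.fft(z)))
# spatial domain: circular convolution computed directly
direct = np.array([sum(d[j] * z[(i - j) % 16] for j in range(16))
                   for i in range(16)])
print(np.allclose(spatial, direct))   # True
```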

11.
IEEE Trans Neural Netw Learn Syst ; 29(4): 1120-1131, 2018 04.
Article in English | MEDLINE | ID: mdl-28212101

ABSTRACT

The semisupervised least squares support vector machine (LS-S3VM) is an important enhancement of least squares support vector machines for semisupervised learning. Given that most data collected from the real world are unlabeled, semisupervised approaches are more applicable than standard supervised ones. Although a few training methods for LS-S3VM exist, the problem of deriving the optimal decision hyperplane efficiently and effectively has not been solved. In this paper, a fully weighted model of LS-S3VM is proposed, and a simple integer programming (IP) model is introduced through an equivalent transformation to solve it. Based on the distances between the unlabeled data and the decision hyperplane, a new indicator is designed to represent the possibility that the label of an unlabeled datum should be reversed in each iteration during training. Using the indicator, we construct an extended candidate set consisting of the indices of unlabeled data with high possibilities, which integrates more information from the unlabeled data. Our algorithm degenerates into a special case of the previous algorithm when the extended candidate set is reduced to a single element. Two strategies are utilized to determine the descent directions based on the extended candidate set. Furthermore, we develop a novel method for locating a good starting point based on the properties of the equivalent IP model. Combined with the extended candidate set and the carefully computed starting point, a fast algorithm that solves LS-S3VM quasi-optimally is proposed. The choice of quasi-optimal solutions results in low computational cost and avoids overfitting. Experiments show that our algorithm, equipped with the two designed strategies, is more effective than other algorithms in at least one of the following three aspects: 1) computational complexity; 2) generalization ability; and 3) flexibility, while performing comparably in the remaining aspects.

12.
IEEE Trans Neural Netw Learn Syst ; 26(3): 444-57, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25720002

ABSTRACT

When the amount of labeled data is limited, semisupervised learning can improve the learner's performance by also exploiting the often easily available unlabeled data. In particular, a popular approach requires the learned function to be smooth on the underlying data manifold. By approximating this manifold as a weighted graph, such graph-based techniques can often achieve state-of-the-art performance. However, their high time and space complexities make them less attractive on large data sets. In this paper, we propose to scale up graph-based semisupervised learning using a set of sparse prototypes derived from the data. These prototypes serve as a small set of data representatives, which can be used to approximate the graph-based regularizer and to control model complexity. Consequently, both training and testing become much more efficient. Moreover, when the Gaussian kernel is used to define the graph affinity, a simple and principled method to select the prototypes can be obtained. Experiments on a number of real-world data sets demonstrate encouraging performance and scaling properties of the proposed approach. It also compares favorably with models learned via l1-regularization at the same level of model sparsity. These results demonstrate the efficacy of the proposed approach in producing highly parsimonious and accurate models for semisupervised learning.


Subjects
Pattern Recognition, Automated/methods; Supervised Machine Learning; Support Vector Machine; Datasets as Topic/trends; Pattern Recognition, Automated/trends; Supervised Machine Learning/trends; Support Vector Machine/trends
13.
IEEE Trans Neural Netw Learn Syst ; 26(1): 152-64, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25312945

ABSTRACT

The Nyström method is an efficient technique for the eigenvalue decomposition of large kernel matrices. However, to ensure an accurate approximation, a sufficient number of columns has to be sampled. On very large data sets, the singular value decomposition (SVD) step on the resultant data submatrix can quickly dominate the computation and become prohibitive. In this paper, we propose an accurate and scalable Nyström scheme that first samples a large column subset from the input matrix, but then only performs an approximate SVD on the inner submatrix using recent randomized low-rank matrix approximation algorithms. Theoretical analysis shows that the proposed algorithm is as accurate as the standard Nyström method that directly performs a large SVD on the inner submatrix. On the other hand, its time complexity is only that of a small SVD. Encouraging results are obtained on a number of large-scale data sets for low-rank approximation. Moreover, as the most computationally expensive steps can be easily distributed and there is minimal data transfer among the processors, significant speedup can be further obtained with multiprocessor and multi-GPU systems.
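A sketch of the overall recipe, under assumed details: form the Nyström factors from a landmark subset, but replace the full SVD of the inner submatrix with a randomized range finder plus a small SVD. The kernel choice, oversampling parameter `p`, and landmark selection below are all illustrative.

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_randomized(X, landmarks, k, p=10, seed=0):
    """Nystrom approximation with a randomized SVD on the inner matrix W.

    Instead of a full decomposition of the m x m inner submatrix, sketch
    its range with a Gaussian test matrix (rank k plus oversampling p)."""
    rng = np.random.default_rng(seed)
    C = rbf(X, landmarks)                     # n x m cross-kernel
    W = rbf(landmarks, landmarks)             # m x m inner submatrix
    Q, _ = np.linalg.qr(W @ rng.normal(size=(W.shape[0], k + p)))
    Ub, s, _ = np.linalg.svd(Q.T @ W @ Q)     # small SVD; W is symmetric PSD
    U = Q @ Ub[:, :k]
    # K ≈ C W_k^+ C^T, returned through the factor L so that K ≈ L L^T
    return C @ U / np.sqrt(s[:k])

rng = np.random.default_rng(9)
X = rng.normal(size=(500, 5))
L = nystrom_randomized(X, X[:50], k=20)       # first 50 points as landmarks
K = rbf(X, X)
print(np.linalg.norm(K - L @ L.T) / np.linalg.norm(K))   # relative error
```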

14.
IEEE Trans Neural Netw Learn Syst ; 26(9): 1927-38, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25343772

ABSTRACT

Nonparametric kernel learning (NPKL) is a flexible approach to learn the kernel matrix directly without assuming any parametric form. It can be naturally formulated as a semidefinite program (SDP), which, however, is not very scalable. To address this problem, we propose the combined use of low-rank approximation and block coordinate descent (BCD). Low-rank approximation avoids the expensive positive semidefinite constraint in the SDP by replacing the kernel matrix variable with VᵀV, where V is a low-rank matrix. The resultant nonlinear optimization problem is then solved by BCD, which optimizes each column of V sequentially. It can be shown that the proposed algorithm has nice convergence properties and low computational complexities. Experiments on a number of real-world data sets show that the proposed algorithm outperforms state-of-the-art NPKL solvers.

15.
IEEE Trans Neural Netw Learn Syst ; 25(12): 2275-87, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25420248

ABSTRACT

In hierarchical classification, the output labels reside on a tree- or directed acyclic graph (DAG)-structured hierarchy. At test time, the prediction paths of a given test example may be required to end at leaf nodes of the label hierarchy. This is called mandatory leaf node prediction (MLNP) and is particularly useful when the leaf nodes have much stronger semantic meaning than the internal nodes. However, while there are many MLNP methods for hierarchical multiclass classification, performing MLNP in hierarchical multilabel classification is difficult. In this paper, we propose novel MLNP algorithms that consider the global label hierarchy structure. We show that the joint posterior probability over all the node labels can be efficiently maximized by dynamic programming for label trees, or by a greedy algorithm for label DAGs. In addition, both algorithms can be further extended to minimize the expected symmetric loss. Experiments are performed on real-world MLNP data sets with label trees and label DAGs. The proposed method consistently outperforms other hierarchical and flat multilabel classification methods.

16.
Neural Netw ; 60: 17-24, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25108150

ABSTRACT

In online learning with kernels, it is vital to control the size (budget) of the support set because of the curse of kernelization. In this paper, we propose two simple and effective stochastic strategies for controlling the budget. Both algorithms have an expected regret that is sublinear in the horizon. Experimental results on a number of benchmark data sets demonstrate encouraging performance in terms of both efficacy and efficiency.


Subjects
Algorithms; Artificial Intelligence; Software
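One of the simplest budget strategies of this flavor can be sketched as a kernel perceptron that evicts a uniformly random support vector whenever the budget is exceeded; this is a generic stand-in, not necessarily either of the paper's two strategies.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def budget_kernel_perceptron(stream, budget=20, gamma=1.0, seed=0):
    """Online kernel perceptron with a stochastic budget: when the
    support set exceeds `budget`, evict one support vector uniformly
    at random, keeping memory and prediction cost bounded."""
    rng = np.random.default_rng(seed)
    sv, alpha = [], []
    for x, y in stream:
        f = sum(a * rbf(s, x, gamma) for s, a in zip(sv, alpha))
        if y * f <= 0:                        # mistake-driven update
            sv.append(x)
            alpha.append(y)
            if len(sv) > budget:
                j = rng.integers(len(sv))     # uniform random eviction
                sv.pop(j)
                alpha.pop(j)
    return sv, alpha

rng = np.random.default_rng(10)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] * X[:, 1])                # XOR-like labels
sv, alpha = budget_kernel_perceptron(zip(X, y))
print(len(sv))                                # never exceeds the budget
```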
17.
IEEE Trans Neural Netw Learn Syst ; 23(9): 1436-47, 2012 Sep.
Article in English | MEDLINE | ID: mdl-24807927

ABSTRACT

For high-dimensional data, it is often desirable to group similar features together during the learning process. This can reduce the estimation variance and improve the stability of feature selection, leading to better generalization. Moreover, it can also help in understanding and interpreting the data. The octagonal shrinkage and clustering algorithm for regression (OSCAR) is a recent sparse-modeling approach that uses an l1-regularizer and a pairwise l∞-regularizer on the feature coefficients to encourage such feature grouping. Computationally, however, its optimization procedure is very expensive. In this paper, we propose an efficient solver based on the accelerated gradient method. We show that its key proximal step can be solved by a highly efficient, simple iterative group-merging algorithm. Given d input features, this reduces the empirical time complexity from O(d²)-O(d⁵) for existing solvers to just O(d). Experimental results on a number of toy and real-world datasets demonstrate that OSCAR is a competitive sparse-modeling approach, but with the added ability of automatic feature grouping.
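The structure that makes an efficient proximal step possible is that the pairwise l∞ term is a sorted weighted l1-norm: sorting |w| in ascending order, the i-th smallest value is the maximum in exactly i-1 pairs. The sketch below only evaluates the OSCAR penalty this way; the proximal operator itself, solved by group merging in the paper, is not shown.

```python
import numpy as np

def oscar_penalty(w, lam1, lam2):
    """OSCAR penalty: lam1 * ||w||_1 + lam2 * sum_{i<j} max(|w_i|, |w_j|).

    With |w| sorted ascending, the i-th smallest magnitude is the max in
    exactly i-1 pairs, so the pairwise term is a sorted weighted l1-norm."""
    a = np.sort(np.abs(w))                      # ascending magnitudes
    pairwise = np.sum(np.arange(len(w)) * a)    # weights 0, 1, ..., d-1
    return lam1 * np.abs(w).sum() + lam2 * pairwise

w = np.array([0.5, -0.5, 2.0])
# pairs: max(.5,.5)=.5, max(.5,2)=2, max(.5,2)=2, so pairwise sum = 4.5
print(oscar_penalty(w, lam1=1.0, lam2=1.0))     # 3.0 + 4.5 = 7.5
```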

18.
IEEE Trans Neural Netw Learn Syst ; 23(3): 492-503, 2012 Mar.
Article in English | MEDLINE | ID: mdl-24808554

ABSTRACT

Probabilistic principal component analysis (PPCA) is a popular linear latent variable model for performing dimension reduction on 1-D data in a probabilistic manner. However, when used on 2-D data such as images, PPCA suffers from the curse of dimensionality due to the resulting large number of model parameters. To overcome this problem, we propose in this paper a novel probabilistic model for 2-D data called bilinear PPCA (BPPCA), which allows a closer tie to be established between BPPCA and its nonprobabilistic counterpart. Moreover, two efficient parameter estimation algorithms for fitting BPPCA are developed. Experiments on a number of 2-D synthetic and real-world data sets show that BPPCA is more accurate than existing probabilistic and nonprobabilistic dimension reduction methods.

19.
IEEE Trans Neural Netw ; 22(2): 199-210, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21095864

ABSTRACT

Domain adaptation allows knowledge from a source domain to be transferred to a different but related target domain. Intuitively, discovering a good feature representation across domains is crucial. In this paper, we first propose to find such a representation through a new learning method, transfer component analysis (TCA), for domain adaptation. TCA tries to learn transfer components across domains in a reproducing kernel Hilbert space using the maximum mean discrepancy. In the subspace spanned by these transfer components, data properties are preserved and data distributions in different domains are close to each other. As a result, with the new representations in this subspace, standard machine learning methods can be applied to train classifiers or regression models in the source domain for use in the target domain. Furthermore, to uncover the knowledge hidden in the relations between the data labels from the source and target domains, we extend TCA to a semisupervised learning setting, which encodes label information into the learning of transfer components. We call this extension semisupervised TCA. The main contribution of our work is a novel dimensionality reduction framework for reducing the distance between domains in a latent space for domain adaptation. We propose both unsupervised and semisupervised feature extraction approaches, which can dramatically reduce the distance between domain distributions by projecting data onto the learned transfer components. Finally, our approach can handle large datasets and naturally leads to out-of-sample generalization. The effectiveness and efficiency of our approach are verified by experiments on five toy datasets and two real-world applications: cross-domain indoor WiFi localization and cross-domain text classification.


Subjects
Artificial Intelligence; Electronic Data Processing/methods; Neural Networks, Computer; Algorithms; Computer Simulation/standards; Pattern Recognition, Automated/methods; Transfer, Psychology
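The quantity TCA minimizes between domains can be sketched directly: the (biased) empirical squared maximum mean discrepancy under an RBF kernel. The projection learning itself is omitted here; the kernel and bandwidth are illustrative.

```python
import numpy as np

def mmd2_rbf(Xs, Xt, gamma=1.0):
    """Biased empirical squared maximum mean discrepancy (MMD) with an
    RBF kernel: MMD^2 = E[k(s,s')] + E[k(t,t')] - 2 E[k(s,t)].
    TCA seeks a projection that makes this small between domains."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(Xs, Xs).mean() + k(Xt, Xt).mean() - 2 * k(Xs, Xt).mean()

rng = np.random.default_rng(11)
src = rng.normal(0.0, 1.0, size=(100, 3))
tgt = rng.normal(0.5, 1.0, size=(100, 3))        # mean-shifted target domain
same = rng.normal(0.0, 1.0, size=(100, 3))       # a fresh same-domain sample
print(mmd2_rbf(src, tgt), mmd2_rbf(src, same))   # shifted pair scores higher
```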
20.
IEEE Trans Neural Netw ; 21(10): 1576-87, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20805054

ABSTRACT

The kernel (or similarity) matrix plays a key role in many machine learning algorithms such as kernel methods, manifold learning, and dimension reduction. However, the cost of storing and manipulating the complete kernel matrix makes it infeasible for large problems. The Nyström method is a popular sampling-based low-rank approximation scheme for reducing the computational burden of handling large kernel matrices. In this paper, we analyze how the approximation quality of the Nyström method depends on the choice of landmark points, and in particular on the encoding power of the landmark points in summarizing the data. Our (non-probabilistic) error analysis justifies a "clustered Nyström method" that uses the k-means cluster centers as landmark points. Our algorithm can be applied to scale up a wide variety of algorithms that depend on the eigenvalue decomposition of the kernel matrix (or its variant), such as kernel principal component analysis, Laplacian eigenmaps, and spectral clustering, as well as those involving the kernel matrix inverse, such as the least-squares support vector machine and Gaussian process regression. Extensive experiments demonstrate the competitive performance of our algorithm in both accuracy and efficiency.


Subjects
Algorithms; Artificial Intelligence; Cluster Analysis; Least-Squares Analysis
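A minimal end-to-end sketch of the clustered Nyström idea, with an assumed RBF kernel and a plain Lloyd's k-means: use the cluster centers as landmark points and form the standard Nyström approximation C W^+ C^T.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centers."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            if (labels == j).any():
                C[j] = X[labels == j].mean(0)
    return C

def clustered_nystrom(X, k, gamma=0.5):
    """Nystrom approximation using k-means centers as landmark points,
    as the error analysis suggests: K ≈ C W^+ C^T."""
    def rbf(A, B):
        return np.exp(-gamma * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
    L = kmeans(X, k)                          # landmarks = cluster centers
    C, W = rbf(X, L), rbf(L, L)
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(12)
X = rng.normal(size=(300, 4))
K = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
K_hat = clustered_nystrom(X, k=30)
print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))   # relative error
```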